Abstract: Lower bounds on mutual information (MI) of long-haul optical fiber systems for
hard-decision and soft-decision decoding are studied. Ready-to-use expressions to calculate
the MI are presented. Extensive numerical simulations are used to quantify how changes in the
optical transmitter, receiver, and channel affect the achievable transmission rates of the
system. Special emphasis is placed on the use of different quadrature amplitude modulation
formats, channel spacings, digital back-propagation schemes, and probabilistic shaping. The
advantages of using MI over the prevailing Q-factor as a figure of merit of coded optical
systems are also highlighted.
OCIS codes: (060.4080) Modulation; (060.2330) Fiber optics communications. 
1. Introduction

The demand for increased data rates in optical long-haul communications has been growing
for several years [1]. For currently deployed fibers, high-order modulation formats are one of
the most popular alternatives to increase data rates. The decreased sensitivity of these formats
conflicts with the requirement that the bit-error rate (BER) after decoding must be on the
order of 10^-15. One potential solution to meet this requirement is to use stronger forward
error correction (FEC). These advanced FEC schemes typically operate with soft-decision (SD)
decoders, i.e., the decoders are fed with soft information (reliabilities) on the code bits.

Codes with hard-decision (HD) decoding were the de facto standard in optical long-haul
systems [2] as they performed well with limited decoding complexity. Assuming ideal
interleaving, the achievable rates of HD-FEC decoders can be fully determined by the pre-FEC
BER (or equivalently the Q-factor) [3, Sec. 6.5]. On the other hand, such a relationship
between pre-FEC BER and achievable rate does not exist for coded systems with SD decoding.
This follows from the fact that SD-FEC decoders use information on the reliability of the bits
rather than on HDs. Mutual information (MI) is an achievable transmission rate for SD-FEC
decoders, and thus a natural figure of merit to consider.

In optical communications, MI has been used as a predictor of post-FEC BER that is more
reliable than pre-FEC BER [4] and as a reference for the performance of capacity-achieving
codes [5]. It has also been used to analytically state lower-bound estimates on capacity [6].
In this paper, we are interested in obtaining a lower bound on MI from the actual output of
an optical fiber channel. The objective is to use the bound to quantify achievable rates and
compare them for varying system parameters using a Monte Carlo approach. MI estimation
from samples of the optical channel has been investigated before. In [7], an estimate of MI for
fiber optics is found via computationally intensive simulations. In [8], ring modulations for the
optical channel are studied to estimate MI, from which a lower bound on capacity is obtained.
References [9] and [10] both study probabilistic shaping schemes with MI as figure of merit and
compare their respective results with [?]. In [11, Sec. IV-B], equations for lower bounds on MI
are presented for continuous channel inputs and quadrature phase-shift keying (QPSK), which are
then used to verify a channel model. We follow a similar approach as in [11] to obtain a lower
bound, yet provide a simple and general expression for arbitrary discrete modulation formats
and use it to analyze optical fiber systems. In this paper, we extend our previous work [12]
by formally studying a bound rather than an estimate, and by considering a broader range of
system parameters. The bound is obtained with circularly symmetric Gaussian noise statistics,
which have been used for this purpose before, e.g., in [?]. We also consider achievable rates
for nonuniform input distributions, which is an extension of [?]. In this work, we neglect
any memory in the channel, which gives a lower bound on the MI of the true channel with
memory [?, Sec. III-F]. We argue that this is a valid assumption as most practical receivers
neglect memory by not operating on sequences of symbols but making decisions on a
symbol-by-symbol basis.

In this letter, we study achievable rates of long-haul optical fiber systems. The main
contribution is to present ready-to-use expressions for calculating a bound on MI that is based
on the input and output symbols and obtained by assuming circularly symmetric Gaussian noise
statistics. This allows us to quantify changes in data rate and spectral efficiency when system
parameters or the digital signal processing (DSP) of the considered system are varied. We also
examine MI for HD decoding and different modulation formats, channel spacings, and nonuniform
input.

Fig. 1. Block diagram of a coded communication system with SD (top) and HD (bottom)
demapping and binary decoding.


2. Mutual Information Analysis 

2.1. System Model

The coded communication system we consider in this paper is shown in Fig. 1. A binary encoder
at the transmitter adds redundancy to the information bits b. The m streams of coded bits c_i are
mapped onto symbols that are drawn from a discrete constellation \mathcal{X} with cardinality
|\mathcal{X}| = M = 2^m and probability mass function (PMF) P_X(x). The sequence of symbols x is
transmitted over a memoryless channel with distribution p_{Y|X}(y|x), and the continuous channel
output y is input into a demapper. The top part of Fig. 1 is an SD decoding system where the
soft demapper uses more than two quantization levels. This soft information on the code bits is
passed to the binary SD-FEC decoder. The bottom part of Fig. 1 shows the system with an HD
demapper and a binary HD-FEC decoder.


2.2. Mutual Information as Achievable Rate 

Let Y be the continuous complex output of a memoryless channel with discrete complex input
X. The MI I(X;Y) represents the amount of information in bits per channel use (or equivalently,
bits per symbol) about X that is contained in Y. The MI is defined as

    I(X;Y) = \sum_{x \in \mathcal{X}} P_X(x) \int_{\mathbb{C}} p_{Y|X}(y|x) \log_2 \frac{p_{Y|X}(y|x)}{p_Y(y)} \, dy,    (1)


which is at most m bits per symbol (bit/sym). \mathbb{C} denotes the set of complex numbers. The
operational meaning of MI is an achievable transmission rate; for a fixed PMF P_X(x) and a fixed
memoryless channel, it is the largest achievable rate. This means that when we transmit below
this rate, coding schemes exist that allow the post-FEC BER to be made arbitrarily small. The
converse is also true: an arbitrarily small post-FEC BER cannot be achieved when we transmit
at a rate larger than the MI. Higher transmission rates require changing the PMF P_X(x) or the
channel. Both of these options are analyzed in Sec. 3 for a multi-span wavelength-division
multiplexing (WDM) system.



























































2.3. Lower Bounds on Mutual Information 

In order to calculate Eq. (1), the channel transition probability p_{Y|X}(y|x) must be known.
Since no analytical expression exists for an optical fiber channel, we need to bound Eq. (1). We
obtain a lower bound on Eq. (1) by using the mismatched decoding approach [14, Sec. VI] and
consider an auxiliary channel with transition probability q_{Y|X}(y|x) instead of the memoryless
p_{Y|X}(y|x). The output distribution of the auxiliary channel is
q_Y(y) = \sum_{x \in \mathcal{X}} q_{Y|X}(y|x) P_X(x). We bound I(X;Y) in Eq. (1) as

    I(X;Y) \ge R_{SD} = \sum_{x \in \mathcal{X}} P_X(x) \int_{\mathbb{C}} p_{Y|X}(y|x) \log_2 \frac{q_{Y|X}(y|x)}{q_Y(y)} \, dy.    (2)


R_SD is an achievable rate for a receiver with SD-FEC decoding as shown in Fig. 1 (top). In this
work, the transition probability q_{Y|X}(y|x) is taken to be Gaussian with noise variance \sigma^2,

    q_{Y|X}(y|x) = \frac{1}{\pi \sigma^2} \exp\left( -\frac{|y - x|^2}{\sigma^2} \right).    (3)

We justify the assumption of white Gaussian noise statistics by the good agreement of fiber
simulations shown in [?] and the Gaussian noise model [15]. We approximate R_SD in Eq. (2) using
a Monte Carlo approach. Suppose we receive N distorted complex symbols y_n, n = 1, ..., N. We
first estimate the noise variance \sigma^2 from all received symbols. The conditional probability
that a specific x was sent when we observe y_n follows from Eq. (3) and Bayes’ theorem:


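    q_{X|Y}(x|y_n) = \frac{\exp\left( -\frac{|y_n - x|^2}{\sigma^2} \right) P_X(x)}{\sum_{x' \in \mathcal{X}} \exp\left( -\frac{|y_n - x'|^2}{\sigma^2} \right) P_X(x')}.    (4)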


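We calculate q_{X|Y}(x|y_n) for all M · N combinations of potential x and observed y_n. A
ready-to-use expression for the lower bound on MI is then

    R_{SD} \approx -\sum_{x \in \mathcal{X}} P_X(x) \log_2 P_X(x) + \frac{1}{N} \sum_{n=1}^{N} \sum_{x \in \mathcal{X}} q_{X|Y}(x|y_n) \log_2 q_{X|Y}(x|y_n).    (5)

For uniformly distributed input, P_X(x) = 1/M and Eq. (5) simplifies to

    R_{SD} \approx m + \frac{1}{N} \sum_{n=1}^{N} \sum_{x \in \mathcal{X}} q_{X|Y}(x|y_n) \log_2 q_{X|Y}(x|y_n).    (6)

The estimation accuracy of Eqs. (5) and (6) increases with N.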
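The Monte Carlo procedure of Eqs. (4) and (5) can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the results in this paper. The function names (square_qam, rate_sd, simulate_awgn), the AWGN test channel, and the way the noise variance is estimated (from the difference between received and transmitted symbols) are assumptions made for the example.

```python
import numpy as np


def square_qam(M):
    """Square M-QAM constellation, scaled to unit energy under a uniform PMF."""
    k = int(round(np.sqrt(M)))
    levels = np.arange(-(k - 1), k, 2)          # e.g. [-3, -1, 1, 3] for 16-QAM
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))


def rate_sd(x_tx, y_rx, constellation, pmf=None):
    """Monte Carlo estimate of R_SD in bit/sym following Eqs. (4) and (5)."""
    M = constellation.size
    # The PMF must be strictly positive; None selects the uniform input.
    pmf = np.full(M, 1.0 / M) if pmf is None else np.asarray(pmf, dtype=float)
    # Assumption: the noise variance is estimated with knowledge of the sent symbols.
    sigma2 = np.mean(np.abs(y_rx - x_tx) ** 2)
    # Eq. (4): posterior of every constellation point for every sample (N x M).
    d2 = np.abs(y_rx[:, None] - constellation[None, :]) ** 2
    log_w = -d2 / sigma2 + np.log(pmf)[None, :]
    log_w -= log_w.max(axis=1, keepdims=True)   # avoid underflow in exp()
    w = np.exp(log_w)
    post = w / w.sum(axis=1, keepdims=True)
    # Eq. (5): H(X) plus the sample average of sum_x q log2 q (with 0 log 0 = 0).
    tiny = np.finfo(float).tiny
    neg_cond_entropy = np.mean(np.sum(post * np.log2(np.maximum(post, tiny)), axis=1))
    h_x = -np.sum(pmf * np.log2(pmf))
    return h_x + neg_cond_entropy


def simulate_awgn(constellation, snr_db, n_symbols, rng):
    """Hypothetical test channel: uniform symbols plus complex Gaussian noise."""
    x = rng.choice(constellation, size=n_symbols)
    sigma2 = np.mean(np.abs(constellation) ** 2) / 10 ** (snr_db / 10)
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_symbols)
                                   + 1j * rng.standard_normal(n_symbols))
    return x, x + noise


rng = np.random.default_rng(0)
qam16 = square_qam(16)
for snr_db in (0, 10, 20):
    x, y = simulate_awgn(qam16, snr_db, 20000, rng)
    print(f"SNR {snr_db:2d} dB: R_SD ~ {rate_sd(x, y, qam16):.2f} bit/sym")
```

For a uniform PMF the first term equals m, so the function then evaluates Eq. (6). A nonuniform input is handled by passing a strictly positive pmf whose entries are ordered like the constellation points.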

Achievable rates for the binary HD-FEC shown in Fig. 1 (bottom) are calculated by considering
two different FEC schemes. The first setup consists of m parallel binary encoder-decoder
pairs, which means that there is a component code for each of the m binary sub-channels. We
consider a multistage decoder setup with no information exchange between the component
decoders [16, Sec. 6]. Let C_1, ..., C_m be binary random variables that are input into the
mapper and \hat{C}_1, ..., \hat{C}_m the corresponding binary inputs into the binary decoder. An
achievable rate for this design is denoted by R_HD^m and defined as

    R_{HD}^{m} = \sum_{i=1}^{m} \left( 1 - H_b(p_i) \right),    (7)

where H_b(p_i) = -p_i \log_2 p_i - (1 - p_i) \log_2 (1 - p_i) is the binary entropy function and
p_i is the BER of the i-th binary symmetric channel with input C_i and output \hat{C}_i. The
second case we consider consists of one binary encoder-decoder pair that operates jointly on all
m sub-channels and thus “sees” the average bit error rate \bar{p} = \sum_{i=1}^{m} p_i / m. An
achievable rate for this FEC scheme is denoted by R_HD^1 and defined as

    R_{HD}^{1} = m \left( 1 - H_b(\bar{p}) \right) \le R_{HD}^{m},    (8)

where the inequality follows from Jensen’s inequality [17, Sec. 2.6].
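The two HD rates can be computed directly from per-bit pre-FEC BERs. The following Python sketch assumes the forms "sum over the m sub-channels of 1 - H_b(p_i)" for the parallel scheme and "m (1 - H_b of the average BER)" for the joint scheme, as described in the text. The function names and the example BER values are hypothetical and serve only as illustration.

```python
import numpy as np


def binary_entropy(p):
    """Binary entropy H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    p = np.asarray(p, dtype=float)
    q = 1.0 - p
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -p * np.log2(p) - q * np.log2(q)
    return np.where((p <= 0) | (p >= 1), 0.0, h)


def rate_hd_parallel(ber_per_bit):
    """m parallel encoder-decoder pairs: one binary symmetric channel per bit."""
    return float(np.sum(1.0 - binary_entropy(ber_per_bit)))


def rate_hd_joint(ber_per_bit):
    """One encoder-decoder pair that only sees the average BER."""
    ber = np.asarray(ber_per_bit, dtype=float)
    return float(ber.size * (1.0 - binary_entropy(ber.mean())))


# Hypothetical per-bit BERs of a 16-QAM symbol (m = 4), for illustration only.
ber = [0.01, 0.01, 0.03, 0.03]
print("parallel:", rate_hd_parallel(ber), "bit/sym")
print("joint:   ", rate_hd_joint(ber), "bit/sym")
```

Because the binary entropy function is concave, the joint rate never exceeds the parallel rate, and both coincide when all sub-channels have the same BER.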

3. Mutual Information Analysis for Long-Haul Optical Fiber Communications 

3.1. Simulation Setup 

The dual-polarization multi-span WDM simulation setup we consider here is shown in Fig. 2.
We generate 2*® Gray-labeled QAM symbols per polarization from a random bit sequence. 
A root-raised cosine (RRC) filter with 5% roll-off is used for pulse-shaping. The pulses are 
ideally converted into the analog and optical domain. The symbol rate is 28 GBaud and the 
signal bandwidth is 29.4 GHz. The number of simulated WDM channels varies with the 
channel spacing such that the total signal bandwidth is kept constant at 450 GHz. The above 
steps are repeated to generate the y-polarization. 

The simulated fiber link consists of spans of length 100 km, each followed by an Erbium-doped
fiber amplifier (EDFA) with a noise figure of 4 dB. The fiber is single-mode fiber (SMF)
with \alpha = 0.2 dB/km, \gamma = 1.3 (W km)^-1, and D = 17 ps/nm/km. Signal propagation is
simulated using the split-step Fourier method with 32 samples per symbol. The step size is
100 m for the linear regime and is decreased to 10 m for the nonlinear (high input power) regime.
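As an illustration of the split-step Fourier method with the fiber parameters above, the following Python sketch propagates a scalar field over one span and then inverts the propagation. It is a simplified sketch under several assumptions that are not taken from this paper: a single polarization, a symmetric splitting of the linear step, a carrier wavelength of 1550 nm, one particular Fourier sign convention, and no amplifier noise. The function name split_step and the Gaussian test pulse are hypothetical.

```python
import numpy as np

C_NM_PER_PS = 299792.458  # speed of light in nm/ps


def split_step(field, dt_ps, length_km, step_km, direction=+1,
               alpha_db_km=0.2, gamma_w_km=1.3, d_ps_nm_km=17.0,
               wavelength_nm=1550.0):
    """Symmetric split-step Fourier propagation of a scalar field in sqrt(W).

    direction=+1 propagates forward; direction=-1 applies the inverse of every
    step, i.e. ideal back-propagation with the same step size.
    """
    a = np.asarray(field, dtype=complex)
    beta2 = -d_ps_nm_km * wavelength_nm ** 2 / (2 * np.pi * C_NM_PER_PS)  # ps^2/km
    alpha = alpha_db_km * np.log(10) / 10                                 # 1/km
    omega = 2 * np.pi * np.fft.fftfreq(a.size, d=dt_ps)                   # rad/ps
    n_steps = int(round(length_km / step_km))
    # Half-step linear operator: attenuation and chromatic dispersion.
    half_linear = np.exp(direction * (0.5j * beta2 * omega ** 2 - alpha / 2)
                         * step_km / 2)
    for _ in range(n_steps):
        a = np.fft.ifft(half_linear * np.fft.fft(a))
        # Full-step Kerr phase rotation, which leaves |a| unchanged.
        a = a * np.exp(direction * 1j * gamma_w_km * np.abs(a) ** 2 * step_km)
        a = np.fft.ifft(half_linear * np.fft.fft(a))
    return a


n, dt = 1024, 1e3 / (28 * 32)      # 32 samples per symbol at 28 GBaud, in ps
t = (np.arange(n) - n // 2) * dt
pulse = np.sqrt(1e-3) * np.exp(-t ** 2 / (2 * 20.0 ** 2))  # hypothetical test pulse
received = split_step(pulse, dt, length_km=100, step_km=0.1)
recovered = split_step(received, dt, length_km=100, step_km=0.1, direction=-1)
loss_db = 10 * np.log10(np.sum(np.abs(pulse) ** 2) / np.sum(np.abs(received) ** 2))
print("span loss [dB]:", loss_db)
print("max back-propagation error:", np.max(np.abs(recovered - pulse)))
```

In this noise-free sketch, back-propagation with the same step size undoes the forward propagation up to numerical precision, because each backward step is the exact inverse of the corresponding forward step.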

The incoming optical signal is filtered with an ideal optical band-pass filter and converted
ideally into the digital domain. Either ideal electronic dispersion compensation (EDC) or ideal
digital back-propagation (DBP) with the same step size and samples per symbol as for the
forward propagation is applied. We use either single-channel (SC) DBP or multi-channel (MC)
DBP of the full field. Neither equalization nor carrier phase recovery is required as
polarization mode dispersion and laser phase noise are not present. After matched filtering and
downsampling, R_SD is calculated from the received symbols of the center channel as outlined in
Sec. 2.3 and averaged over both polarizations. The dashed box in Fig. 2 depicts the channel
for which R_SD is calculated. The channel contains not only the fiber itself but also most of the
transmitter and receiver DSP. R_HD^m and R_HD^1 are calculated from the pre-FEC BERs of the
center WDM channel after hard-decision demapping.
Fig. 2. Block diagram of the simulated optical system. The dashed box includes all components
and subsystems that influence I(X;Y).









































Fig. 3. Achievable rates for soft- and hard-decision decoding: R_SD (solid), R_HD^m (dotted),
R_HD^1 (dashed) and different modulation formats: QPSK (green, x), 16-QAM (blue, diamond)
and 64-QAM (red, triangle). EDC only, 6000 km SMF and N_ch = 15 WDM channels.


3.2. Modulation Format 

In Fig. 3, the achievable rates of three modulation formats vs. launch power are compared for
N_ch = 15 WDM channels, EDC, B_ch = 30 GHz and 6000 km link length (N_s = 60 spans). When
considering the SD rate R_SD (solid lines), QPSK reaches a plateau that is very close to its
maximum of 2 bit/sym. This suggests that QPSK is not the best choice for the input X and we
should use a higher-order QAM. The gain in MI of 64-QAM over 16-QAM at their respective
optimum power is only about 0.1 bit/sym. Furthermore, 64-QAM imposes stronger requirements
on digital-to-analog converters and the optical signal processing algorithms than 16-QAM and
is expected to suffer from a larger implementation penalty. We therefore conclude that 16-QAM
is the optimal square QAM for the setup under consideration.

The operational meaning of the maximum 16-QAM MI of 2.95 bit/sym is as follows. For 
the given setup and a receiver operating under the assumption of Gaussian and independent 
received symbols, at most 2.95 bits of the 4 bits of each 16-QAM symbol are available for 
transmitting information. An ideal FEC operating at this rate must have a coding rate of at most 
2.95/4=0.7375, which corresponds to a coding overhead of 35.6%. A higher transmission rate 
(or less overhead) can under no circumstances give a post-FEC BER on the order of 10^-15.
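The arithmetic behind the coding rate and overhead quoted above can be checked with a short Python snippet. It assumes the usual definition of overhead as redundancy relative to information bits, (1 - R)/R = 1/R - 1. The function name is hypothetical.

```python
def fec_rate_and_overhead(mi_bits, bits_per_symbol):
    """Largest coding rate supported by the MI and the matching FEC overhead."""
    rate = mi_bits / bits_per_symbol
    overhead = 1.0 / rate - 1.0
    return rate, overhead


rate, overhead = fec_rate_and_overhead(2.95, 4)
print(f"code rate {rate:.4f}, overhead {100 * overhead:.1f}%")
# prints: code rate 0.7375, overhead 35.6%
```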

Figure 3 also depicts the results for HD decoding. Let us first compare R_HD^m (dotted lines)
and R_HD^1 (dashed lines). For QPSK, R_HD^m = R_HD^1 as two orthogonal binary channels are
effectively considered. The rate improvement by having m encoder-decoder pairs is less than
0.05 bit/sym for 16-QAM and about 0.24 bit/sym for 64-QAM. We observe that the difference
between R_HD^1 and R_HD^m grows with modulation order because averaging over more parallel
sub-channels means that more information is lost. In the following, we consider only R_HD^1 as
it is an achievable rate for the practically relevant case of one FEC encoder-decoder pair [19].

Comparing R_SD and R_HD^1 (dashed lines), Fig. 3 illustrates that the gap between R_SD and
R_HD^1 grows with modulation order. For QPSK, the gap is 0.05 bit/sym; it increases to
0.5 bit/sym for 16-QAM and is up to 1 bit/sym for 64-QAM. This is because the potential gain of
SD over HD depends on the considered coding rate; it vanishes for high coding rates [3, Sec. 6.8].
The conclusion is that the closer a system is operated to its maximum transmission rate of m,
the less beneficial SD decoding becomes. The importance of MI becomes evident in this case
as it allows us to draw the conclusion that, when we only consider the achievable rates and
disregard practical aspects of modulation formats, R_SD suggests using 64-QAM, while R_HD^1
tells us that 16-QAM is the best choice. Note that R_HD^1 will eventually be larger for 64-QAM
than for 16-QAM when 16-QAM is operated close to 4 bit/sym, see, e.g., [?, Fig. 2]. The
Q-factor does not allow this kind of analysis.

Fig. 4. Achievable rates R_SD (a) and dual-polarization SE (b) for 16-QAM and varying
WDM channel spacings B_ch. The total bandwidth over all WDM channels is constant at
450 GHz and the signal bandwidth per WDM channel is 29.4 GHz.

3.3. WDM Spacing 

In Fig. 4(a), R_SD is evaluated for 16-QAM, EDC, 6000 km and N_ch WDM channels spaced
at B_ch from 27.5 GHz to 50 GHz. For 30 GHz (dashed curve), which is close to the Nyquist
rate of 29.4 GHz, 2.95 bit/sym is achievable at the optimal power. Increasing the channel
spacing from 30 GHz to 50 GHz (solid curves) results in a larger rate of up to 3.15 bit/sym
at the optimum power because the phase mismatch between co-propagating signals increases
and the impact of four-wave mixing decreases. In the linear regime, the MI does not change for
spacings from 30 GHz to 50 GHz. If the channel spacing is reduced to a sub-Nyquist 27.5 GHz
(dotted line), the spectral overlap between adjacent WDM channels leads to additional noise
that decreases R_SD also in the linear regime.

In the context of constant signal bandwidth and varying WDM spacings, spectral efficiency
(SE) must be considered as it directly relates to the net data rate of the system by taking into
account the bandwidth usage. We calculate SE from the per-polarization rates R_SD of Fig. 4(a)
as 2 · R_SD · 28 GBaud / B_ch. We emphasize that we consider only the center channel, i.e., the
one that experiences the strongest nonlinear interference. This is a worst-case scenario and
higher rates might be achievable in other channels. Results are shown in Fig. 4(b). A maximum SE
of 5.51 bit/s/Hz is obtained for the quasi-Nyquist spacing of 30 GHz. In this case, almost the
entire spectrum of a WDM channel is used, which increases the SE more than the smaller MI
decreases it. We conclude that there is no need to have large spectral guard bands to obtain the
maximum SE of the center channel.
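The SE expression 2 · R_SD · 28 GBaud / B_ch is easy to evaluate for the two operating points quoted in this section (2.95 bit/sym at 30 GHz and 3.15 bit/sym at 50 GHz). The Python helper below is a small sketch with a hypothetical function name and default arguments.

```python
def spectral_efficiency(rate_sd, spacing_ghz, symbol_rate_gbaud=28.0, polarizations=2):
    """Dual-polarization spectral efficiency in bit/s/Hz."""
    return polarizations * rate_sd * symbol_rate_gbaud / spacing_ghz


se_30 = spectral_efficiency(2.95, 30.0)   # quasi-Nyquist spacing
se_50 = spectral_efficiency(3.15, 50.0)   # larger rate, but wider spacing
print(f"30 GHz: {se_30:.2f} bit/s/Hz, 50 GHz: {se_50:.2f} bit/s/Hz")
```

The higher per-symbol rate at 50 GHz spacing does not compensate for the larger bandwidth per channel, which is why the SE peaks at the 30 GHz spacing.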
Fig. 5. Achievable rates for soft- and hard-decision decoding: R_SD (solid) and R_HD^1 (dashed)
and different modulation formats: 16-QAM (blue) and 64-QAM (red). The simulation setup
is identical to the one in Fig. 3.


3.4. EDC, Single-Channel DBP and Multi-Channel DBP

In Fig. 5, EDC, SC DBP and MC DBP are compared for the same parameter set as for Fig. 3.
By replacing EDC with SC DBP, R_SD is increased by 0.18 bit/sym for both 16-QAM and
64-QAM. Similar gains are found for R_HD^1. 16-QAM with MC DBP and SD decoding gives a gain
of 0.9 bit/sym over EDC. The plateau around 4 dBm suggests that larger gains are possible
with 64-QAM. Our simulations confirm that an increase of 1.45 bit/sym is possible for
64-QAM when MC DBP is used instead of EDC. Note that the slopes of the curves in Fig. 5 in
the highly nonlinear regime are less steep than previously reported in [?]. This is explained
by the fact that here we use a step size of 10 m while in [?] 100 m was used. A comparison
of ideal SC DBP and MC DBP similar to Fig. 5 was performed in [?] using the Q-factor as
figure of merit. The results in [?] are similar to the ones in Fig. 5. For SD decoding, however,
only a qualitative comparison of different DBP schemes is possible with the Q-factor, and a
relation to data rate gains cannot be made. For MC DBP, the difference between R_SD and R_HD^1
is 0.17 bit/sym for 16-QAM and 0.8 bit/sym for 64-QAM. A practical interpretation of this
is that for 16-QAM with MC DBP and the chosen system parameters, the extra complexity of
SD-FEC might be spared because the potential improvement of SD-FEC over HD-FEC is small
when R_HD^1 is close to m.

3.5. Probabilistic Shaping for 64-QAM

In this section, we choose an input X that is not uniformly distributed over \mathcal{X}, but
instead, we probabilistically shape the constellation using a heuristic scheme. As explained
in [?], the input PMF is chosen to be the Maxwell-Boltzmann distribution that maximizes MI for an
additive white Gaussian noise channel under a power constraint. The rate R_SD in this case is
shown in Fig. 6 as a function of transmission distance. Each point in this plot is obtained for
the optimum launch power and N_ch = 15 WDM channels. We see in Fig. 6 that for 64-QAM
and EDC only, transmission over an additional 200 km is possible without sacrificing data rate
by shaping the input. For distances above 1000 km, the shaping gain over uniform input lies
between 0.1 bit/sym and 0.2 bit/sym, which is comparable to the gain by SC DBP at its optimal
launch power. The fact that the employed heuristic shaping scheme performs as well as the
computationally intensive ideal SC DBP shows the great potential of probabilistic shaping.

Fig. 6. Shaped 64-QAM outperforms uniform 64-QAM (both with EDC) in rate and distance
and gives similar gains as uniform input with SC DBP. The insets a) and b) show the
shaped received constellation after 3 and 16 spans, respectively.
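To make the Maxwell-Boltzmann input of Sec. 3.5 concrete, the following Python sketch evaluates the family P_X(x) proportional to exp(-nu |x|^2) on an unnormalized 64-QAM grid. The shaping parameter name nu, the function names, and the three example values of nu are assumptions for illustration. The sketch does not reproduce how the PMF was optimized for the results in this section.

```python
import numpy as np


def maxwell_boltzmann_pmf(constellation, nu):
    """P_X(x) proportional to exp(-nu |x|^2); nu = 0 gives the uniform PMF."""
    w = np.exp(-nu * np.abs(constellation) ** 2)
    return w / w.sum()


def entropy_bits(pmf):
    """Entropy of a PMF in bits, ignoring zero-probability points."""
    pmf = pmf[pmf > 0]
    return float(-np.sum(pmf * np.log2(pmf)))


levels = np.arange(-7, 8, 2)                       # odd integers -7 ... 7
qam64 = (levels[:, None] + 1j * levels[None, :]).ravel()
for nu in (0.0, 0.02, 0.05):                       # hypothetical shaping strengths
    pmf = maxwell_boltzmann_pmf(qam64, nu)
    energy = float(np.sum(pmf * np.abs(qam64) ** 2))
    print(f"nu={nu:.2f}: H(X)={entropy_bits(pmf):.2f} bit/sym, mean energy={energy:.1f}")
```

Increasing nu favors the inner constellation points, which lowers both the mean symbol energy and the input entropy. At a fixed launch power, the lower mean energy allows the shaped constellation to be scaled up.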

4. Conclusion 

We have studied MI as a figure of merit to gain insights into coded fiber-optic systems. Unlike
the Q-factor, MI represents an achievable rate, and thus it can be directly connected to the
spectral efficiency of the system. Using MI as the figure of merit enables a more meaningful
analysis as it allows us to quantify changes in the spectral efficiency of the system. Although
some conclusions drawn in Sec. 3 apply specifically to the presented configuration, the
application of the MI as a tool for system design is not limited to this configuration, but can
be extended to any device and algorithm of the optical channel, including transmitter and
receiver DSP.

We have lower-bounded the true MI of the optical channel by making two assumptions.
We have first neglected any memory and have then used the mismatched decoder approach
with white Gaussian noise statistics. An interesting question is how much is to be gained by
dropping these two simplifying assumptions. In particular, we believe that tighter lower bounds
on MI can be found in systems for which the Gaussian noise assumption does not hold, such as
single-span systems, or at high launch powers. These questions are left for further investigation.